Point clouds are characterized by irregularity and unstructuredness, which pose challenges in efficient data exploitation and discriminative feature extraction. In this paper, we present an unsupervised deep neural architecture called Flattening-Net to represent irregular 3D point clouds of arbitrary geometry and topology as a completely regular 2D point geometry image (PGI) structure, in which coordinates of spatial points are captured in colors of image pixels. \mr{Intuitively, Flattening-Net implicitly approximates a locally smooth 3D-to-2D surface flattening process while effectively preserving neighborhood consistency.} \mr{As a generic representation modality, PGI inherently encodes the intrinsic property of the underlying manifold structure and facilitates surface-style point feature aggregation.} To demonstrate its potential, we construct a unified learning framework directly operating on PGIs to achieve \mr{diverse types of high-level and low-level} downstream applications driven by specific task networks, including classification, segmentation, reconstruction, and upsampling. Extensive experiments demonstrate that our methods perform favorably against the current state-of-the-art competitors. We will make the code and data publicly available at https://github.com/keeganhk/Flattening-Net.
translated by 谷歌翻译
Learning effective joint embedding for cross-modal data has always been a focus in the field of multimodal machine learning. We argue that during multimodal fusion, the generated multimodal embedding may be redundant, and the discriminative unimodal information may be ignored, which often interferes with accurate prediction and leads to a higher risk of overfitting. Moreover, unimodal representations also contain noisy information that negatively influences the learning of cross-modal dynamics. To this end, we introduce the multimodal information bottleneck (MIB), aiming to learn a powerful and sufficient multimodal representation that is free of redundancy and to filter out noisy information in unimodal representations. Specifically, inheriting from the general information bottleneck (IB), MIB aims to learn the minimal sufficient representation for a given task by maximizing the mutual information between the representation and the target and simultaneously constraining the mutual information between the representation and the input data. Different from general IB, our MIB regularizes both the multimodal and unimodal representations, which is a comprehensive and flexible framework that is compatible with any fusion methods. We develop three MIB variants, namely, early-fusion MIB, late-fusion MIB, and complete MIB, to focus on different perspectives of information constraints. Experimental results suggest that the proposed method reaches state-of-the-art performance on the tasks of multimodal sentiment analysis and multimodal emotion recognition across three widely used datasets. The codes are available at \url{https://github.com/TmacMai/Multimodal-Information-Bottleneck}.
translated by 谷歌翻译
在恶劣天气下的图像修复是一项艰巨的任务。过去的大多数作品都集中在消除图像中的雨水和阴霾现象。但是,雪也是一种极为普遍的大气现象,它将严重影响高级计算机视觉任务的性能,例如对象检测和语义分割。最近,已经提出了一些用于降雪的方法,大多数方法直接将雪图像作为优化对象。但是,雪地点和形状的分布很复杂。因此,未能有效地检测雪花 /雪连胜将影响降雪并限制模型性能。为了解决这些问题,我们提出了一个雪地掩模的自适应残留网络(SMGARN)。具体而言,SMGARN由三个部分组成,即Mask-Net,Guidance-Fusion Network(GF-NET)和重建-NET。首先,我们构建了一个以自像素的注意(SA)和跨像素的注意(CA),以捕获雪花的特征并准确地定位了雪的位置,从而预测了准确的雪山。其次,预测的雪面被发送到专门设计的GF-NET中,以适应指导模型去除雪。最后,使用有效的重建网络来消除面纱效果并纠正图像以重建最终的无雪图像。广泛的实验表明,我们的SMGARN数值优于所有现有的降雪方法,并且重建的图像在视觉对比度上更清晰。所有代码都将可用。
translated by 谷歌翻译
旨在用自然语言和谐地与人类交流的智能对话体系对于促进人工智能时代的人机互动的发展非常出色。有了逐渐复杂的人类计算机交互要求(例如,多模式输入,时间敏感性),传统的基于文本的对话系统很难满足对更加生动和方便的交互的需求。因此,视觉背景增强对话系统(VAD)有可能通过感知和理解多模式信息(即图像或视频中的视觉上下文,文本对话历史记录)与人类进行交流,已成为主要的研究范式。 VAD受益于视觉和文本上下文之间的一致性和互补性,具有产生引人入胜和背景感知响应的潜力。为了描述VAD的开发,我们首先表征VAD的概念和独特功能,然后介绍其通用系统体系结构以说明系统工作流程。随后,对一些研究挑战和代表性作品进行了详细研究,然后进行了权威基准摘要。我们通过提出一些开放问题和有前途的VAD研究趋势来结束本文,例如,在跨模式对话环境下,人机对话的认知机制以及知识增强的跨模式语义互动。
translated by 谷歌翻译
多尺度体系结构和注意力模块在许多基于深度学习的图像脱落方法中都显示出有效性。但是,将这两个组件手动设计和集成到神经网络中需要大量的劳动力和广泛的专业知识。在本文中,高性能多尺度的细心神经体系结构搜索(MANAS)框架是技术开发的。所提出的方法为图像脱落任务的最爱的多个灵活模块制定了新的多尺度注意搜索空间。在搜索空间下,建立了多尺度的细胞,该单元被进一步用于构建功能强大的图像脱落网络。通过基于梯度的搜索算法自动搜索脱毛网络的内部多尺度架构,该算法在某种程度上避免了手动设计的艰巨过程。此外,为了获得强大的图像脱落模型,还提出了一种实用有效的多到一对训练策略,以允许去磨损网络从具有相同背景场景的多个雨天图像中获取足够的背景信息,与此同时,共同优化了包括外部损失,内部损失,建筑正则损失和模型复杂性损失在内的多个损失功能,以实现可靠的损伤性能和可控的模型复杂性。对合成和逼真的雨图像以及下游视觉应用(即反对检测和分割)的广泛实验结果始终证明了我们提出的方法的优越性。
translated by 谷歌翻译
互联网技术的发展不断增强谣言和虚假新闻的传播和破坏力。先前关于多媒体假新闻检测的研究包括一系列复杂的功能提取和融合网络,以实现图像和文本之间的特征对齐。但是,多模式功能由什么组成,以及来自不同模式的特征如何影响决策过程仍然是开放的问题。我们介绍了Aura,这是一个具有自适应单峰表示聚合的多模式假新闻检测网络。我们首先从图像模式,图像语义和文本中分别提取表示形式,并通过将语义和语言表示形式发送到专家网络来生成多模式表示。然后,我们根据单峰和多模式表示,进行粗级的虚假新闻检测和跨模式宇宙性学习。分类和一致性得分被映射到模态感知的注意分数,以重新调整功能。最后,我们汇总并将加权功能分类用于精制的假新闻检测。关于微博和八卦的综合实验证明,Aura可以成功击败几个最先进的FND方案,在该方案中,整体预测准确性和对假新闻的回忆得到稳步改善。
translated by 谷歌翻译
深度学习在各种工业应用中取得了巨大成功。公司不希望他们的宝贵数据被恶意员工偷来培训盗版模式。他们也不希望竞争对手在线使用后分析的数据。我们提出了一种新的解决方案,在这种情况下,通过稳健地并可逆地将图像转换为对手图像。我们开发一个可逆的对抗性示例生成器(Raeg),对图像引入略微变化以欺骗传统的分类模型。尽管恶意攻击培训基于Deacened版本的受保护图像的盗版模型,但Raeg可以显着削弱这些模型的功能。同时,Raeg的可逆性确保了授权模型的表现。广泛的实验表明,Raeg可以通过比以前的方法更好地防止对抗对抗防御的轻微扭曲。
translated by 谷歌翻译
Increasing research interests focus on sequential recommender systems, aiming to model dynamic sequence representation precisely. However, the most commonly used loss function in state-of-the-art sequential recommendation models has essential limitations. To name a few, Bayesian Personalized Ranking (BPR) loss suffers the vanishing gradient problem from numerous negative sampling and predictionbiases; Binary Cross-Entropy (BCE) loss subjects to negative sampling numbers, thereby it is likely to ignore valuable negative examples and reduce the training efficiency; Cross-Entropy (CE) loss only focuses on the last timestamp of the training sequence, which causes low utilization of sequence information and results in inferior user sequence representation. To avoid these limitations, in this paper, we propose to calculate Cumulative Cross-Entropy (CCE) loss over the sequence. CCE is simple and direct, which enjoys the virtues of painless deployment, no negative sampling, and effective and efficient training. We conduct extensive experiments on five benchmark datasets to demonstrate the effectiveness and efficiency of CCE. The results show that employing CCE loss on three state-of-the-art models GRU4Rec, SASRec, and S3-Rec can reach 125.63%, 69.90%, and 33.24% average improvement of full ranking NDCG@5, respectively. Using CCE, the performance curve of the models on the test data increases rapidly with the wall clock time, and is superior to that of other loss functions in almost the whole process of model training.
translated by 谷歌翻译
Decompilation aims to transform a low-level program language (LPL) (eg., binary file) into its functionally-equivalent high-level program language (HPL) (e.g., C/C++). It is a core technology in software security, especially in vulnerability discovery and malware analysis. In recent years, with the successful application of neural machine translation (NMT) models in natural language processing (NLP), researchers have tried to build neural decompilers by borrowing the idea of NMT. They formulate the decompilation process as a translation problem between LPL and HPL, aiming to reduce the human cost required to develop decompilation tools and improve their generalizability. However, state-of-the-art learning-based decompilers do not cope well with compiler-optimized binaries. Since real-world binaries are mostly compiler-optimized, decompilers that do not consider optimized binaries have limited practical significance. In this paper, we propose a novel learning-based approach named NeurDP, that targets compiler-optimized binaries. NeurDP uses a graph neural network (GNN) model to convert LPL to an intermediate representation (IR), which bridges the gap between source code and optimized binary. We also design an Optimized Translation Unit (OTU) to split functions into smaller code fragments for better translation performance. Evaluation results on datasets containing various types of statements show that NeurDP can decompile optimized binaries with 45.21% higher accuracy than state-of-the-art neural decompilation frameworks.
translated by 谷歌翻译
Recently the deep learning has shown its advantage in representation learning and clustering for time series data. Despite the considerable progress, the existing deep time series clustering approaches mostly seek to train the deep neural network by some instance reconstruction based or cluster distribution based objective, which, however, lack the ability to exploit the sample-wise (or augmentation-wise) contrastive information or even the higher-level (e.g., cluster-level) contrastiveness for learning discriminative and clustering-friendly representations. In light of this, this paper presents a deep temporal contrastive clustering (DTCC) approach, which for the first time, to our knowledge, incorporates the contrastive learning paradigm into the deep time series clustering research. Specifically, with two parallel views generated from the original time series and their augmentations, we utilize two identical auto-encoders to learn the corresponding representations, and in the meantime perform the cluster distribution learning by incorporating a k-means objective. Further, two levels of contrastive learning are simultaneously enforced to capture the instance-level and cluster-level contrastive information, respectively. With the reconstruction loss of the auto-encoder, the cluster distribution loss, and the two levels of contrastive losses jointly optimized, the network architecture is trained in a self-supervised manner and the clustering result can thereby be obtained. Experiments on a variety of time series datasets demonstrate the superiority of our DTCC approach over the state-of-the-art.
translated by 谷歌翻译